[Bugfix][plugin] Fix FLA crash on plugin #27322
Conversation
Code Review
This pull request addresses a crash in Flash-Linear-Attention (FLA) operations when used with plugins. The changes in vllm/model_executor/layers/fla/ops/utils.py are well-reasoned and effective. By leveraging current_platform.is_cuda_alike(), the code now correctly identifies CUDA-compatible platforms (including plugins) and sets the device library appropriately. Adding a None default to getattr is a good defensive measure that prevents crashes on other platforms like CPU, making the utility more robust. The fix is correct and improves the overall stability of FLA operations in diverse environments.
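The selection logic the review describes can be sketched as follows. This is an illustrative stand-in, not the actual code in `vllm/model_executor/layers/fla/ops/utils.py`: `resolve_device_lib` and the fake namespace are hypothetical names, and a `types.SimpleNamespace` substitutes for the real `torch` module so the sketch runs without GPU libraries installed.

```python
import types

# Stand-in "torch-like" namespace so this sketch runs anywhere; on a real
# system this would be the `torch` module itself, with `torch.cuda` etc.
fake_torch = types.SimpleNamespace(cuda="torch.cuda (stand-in)")


def resolve_device_lib(torch_mod, is_cuda_alike: bool, device_name: str):
    """Mimic the device-library selection described in the review above.

    CUDA-alike platforms (NVIDIA, AMD ROCm, and out-of-tree plugins such as
    vllm-metax, whose device name is "maca") still map to torch.cuda; any
    other platform falls back to a getattr lookup with a None default
    instead of raising AttributeError.
    """
    if is_cuda_alike:
        return torch_mod.cuda
    return getattr(torch_mod, device_name, None)
```

With this shape, a plugin whose device name has no matching `torch` submodule gets `None` back rather than crashing at import time.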
NickLucche
left a comment
I think this looks fine but I don't have the context on fla.
Perhaps @youkaichao can take a quick look at it.
Force-pushed from 873d0a9 to b483cc9
mgoin
left a comment
Looks simple enough to me. I believe the logic is kept the same for NVIDIA and AMD, so nothing changes for Intel or CPU.
Signed-off-by: Hank <hcc.mayday@gmail.com>
Force-pushed from 59e4598 to e379c89
related: vllm-project/vllm/pull/27322
Signed-off-by: Hank <hcc.mayday@gmail.com>

* support platform and remove kernel copy Signed-off-by: Hank <hcc.mayday@gmail.com>
* update pre-commit Signed-off-by: Hank <hcc.mayday@gmail.com>
* update version and requirements Signed-off-by: Hank <hcc.mayday@gmail.com>
* update flashinfer Signed-off-by: Hank <hcc.mayday@gmail.com>
* update build requirements Signed-off-by: Hank <hcc.mayday@gmail.com>
* update attention backends Signed-off-by: Hank <hcc.mayday@gmail.com>
* update patch Signed-off-by: Hank <hcc.mayday@gmail.com>
* update quant_method Signed-off-by: Hank <hcc.mayday@gmail.com>
* update fuse_moe (todo: fix mypy) Signed-off-by: Hank <hcc.mayday@gmail.com>
* update `deepseek_v2.py` (todo: fix indexer kernel) Signed-off-by: Hank <hcc.mayday@gmail.com>
* [feat] support bf16 cp_gather_indexer_k_cache kernel Signed-off-by: Xin Li <lixin1620@gmail.com>
* [fix] fix type error in bf16_paged_mqa_logits Signed-off-by: leex404 <lixin1620@gmail.com>
* [feat] add topk logits ops Signed-off-by: leex404 <lixin1620@gmail.com>
* [fix] private memory size too large in `sample_recovered_tokens_kernel` (#115)
  * [fix] fix sample_recovered_tokens_kernel use too much private memory Signed-off-by: Xin Li <xin.li@metax-tech.com>
  * [fix] fix type error in bf16_paged_mqa_logits Signed-off-by: Xin Li <xin.li@metax-tech.com>
  * [chore] change file directory Signed-off-by: Xin Li <xin.li@metax-tech.com>
  ---------
  Signed-off-by: Xin Li <xin.li@metax-tech.com>
  Co-authored-by: Xin Li <xin.li@metax-tech.com>
  Signed-off-by: leex404 <lixin1620@gmail.com>
* [fix] fix missing topk logits custom ops definition Signed-off-by: leex404 <lixin1620@gmail.com>
* [fix] add custom gptq_shuffle ops Signed-off-by: leex404 <lixin1620@gmail.com>
* [fix] fix compile error Signed-off-by: leex404 <lixin1620@gmail.com>
* platform config update Signed-off-by: Hank <hcc.mayday@gmail.com>
* update qwen2.5_vl model Signed-off-by: Hank <hcc.mayday@gmail.com>
* [fix] fix torch not found maca device Signed-off-by: leex404 <lixin1620@gmail.com>
* remove hotfixes patch for torch2.8 Signed-off-by: Hank <hcc.mayday@gmail.com>
* remove needless patch related: vllm-project/vllm/pull/27322 Signed-off-by: Hank <hcc.mayday@gmail.com>
* [feat] topk_softmax support renormalize and bf16 Signed-off-by: leex404 <lixin1620@gmail.com>
* [fix] update fused_moe to fit v0.11.1 Signed-off-by: leex404 <lixin1620@gmail.com>
* [fix] fix fused moe config log missing Signed-off-by: leex404 <lixin1620@gmail.com>
* use flash_attn as vit attn backend on qwen_vl Signed-off-by: Hank <hcc.mayday@gmail.com>
* update quant_conf registry Signed-off-by: Hank <hcc.mayday@gmail.com>
* fix and apply latest pre-commit of v0.11.1 Signed-off-by: Hank <hcc.mayday@gmail.com>
* [feat] Keep all AITER kernels in _aiter_ops Signed-off-by: leex404 <lixin1620@gmail.com>
* fix pre-commit on type casting Signed-off-by: Hank <hcc.mayday@gmail.com>
* [fix] fix DeepSeek import error Signed-off-by: leex404 <lixin1620@gmail.com>
* [feat] update deepseek_v2 to fit v0.11.1 Signed-off-by: leex404 <lixin1620@gmail.com>

---------
Signed-off-by: Hank <hcc.mayday@gmail.com>
Signed-off-by: Xin Li <lixin1620@gmail.com>
Signed-off-by: leex404 <lixin1620@gmail.com>
Co-authored-by: Xin Li <xin.li@metax-tech.com>
Co-authored-by: leex404 <lixin1620@gmail.com>
Co-authored-by: leex404 <42941760+leex404@users.noreply.github.com>
Purpose
There is a problem when supporting FLA on a plugin: importing `fla/ops/utils` crashed here. In a plugin, `device` may have its own value (in vllm-metax it is `maca`), while `device_torch_lib` still needs to be the plugin's own library (in vllm-metax, `torch.cuda`). This PR therefore uses `is_cuda_alike` and passes a default value of `None` to `getattr` to handle these corner cases. The semantics are consistent with the original code.

Test Plan
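As a minimal illustration of the failure mode described under Purpose (the names here are hypothetical; the real lookup lives in `vllm/model_executor/layers/fla/ops/utils.py`), a `types.SimpleNamespace` stands in for `torch` on a platform whose plugin reports a custom device name:

```python
import types

# A torch-like namespace with no "maca" attribute, standing in for `torch`
# when the plugin's device name has no matching torch submodule.
torch_like = types.SimpleNamespace(cuda="cuda-lib")

# Before the fix: a two-argument getattr raises AttributeError for an
# unknown device name, which is the crash this PR addresses.
try:
    getattr(torch_like, "maca")
    crashed = False
except AttributeError:
    crashed = True

# After the fix: passing a None default makes the same lookup safe on any
# platform, so import no longer fails.
safe_lookup = getattr(torch_like, "maca", None)
```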
Test Result